Statistical Methods For Retrieving Most Significant Paragraphs In Newspaper Articles
Abstract
Retrieving a most significant paragraph in a newspaper article can act as a kind of summarization. It can give the human reader some hints on the contents of the article and help him to decide whether it deserves a full reading or not. It may also act as a filter for a robust natural language understanding system, which can extract relevant information from that paragraph in order to enable conceptual information retrieval. Given a newspaper article and a base corpus, we identify word co-occurrences with higher resolving power. These co-occurrences are used to establish links between the paragraphs of the article. The paragraph with the largest number of links to other paragraphs is considered a most significant one. Though designed and tested for the Portuguese language, the statistical nature of our proposal should ensure its portability to other languages.
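The abstract does not state how "resolving power" is measured or exactly when two paragraphs are linked, so the following Python sketch fills those gaps with labeled assumptions. It treats a co-occurrence as an unordered pair of distinct words appearing in the same paragraph. It approximates a pair's resolving power by its inverse document frequency (IDF) in the base corpus, and it links two paragraphs when they share at least one selected pair. The function names and the `min_weight` threshold are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase word tokens; \\w+ also matches accented Portuguese letters."""
    return re.findall(r"\w+", text.lower())


def pairs_in(text):
    """Set of unordered word pairs that co-occur within one text unit."""
    words = sorted(set(tokenize(text)))
    return set(combinations(words, 2))


def select_cooccurrences(paragraphs, corpus, min_weight):
    """Keep article word pairs whose corpus IDF is at least min_weight.

    ASSUMPTION: IDF over the base corpus stands in for the paper's
    (unspecified) resolving-power measure. Pairs common in the corpus
    are discarded because they discriminate poorly between topics.
    """
    n_docs = len(corpus)
    corpus_df = Counter()
    for doc in corpus:
        corpus_df.update(pairs_in(doc))  # document frequency of each pair
    article_pairs = set().union(*(pairs_in(p) for p in paragraphs))
    selected = set()
    for pair in article_pairs:
        idf = math.log((n_docs + 1) / (corpus_df[pair] + 1))  # smoothed IDF
        if idf >= min_weight:
            selected.add(pair)
    return selected


def count_links(paragraphs, selected):
    """ASSUMPTION: two paragraphs are linked if they share a selected pair."""
    para_pairs = [pairs_in(p) & selected for p in paragraphs]
    links = [0] * len(paragraphs)
    for i, j in combinations(range(len(paragraphs)), 2):
        if para_pairs[i] & para_pairs[j]:
            links[i] += 1
            links[j] += 1
    return links


def most_significant(paragraphs, corpus, min_weight=1.0):
    """Return (index of the most-linked paragraph, per-paragraph link counts).

    Ties go to the earliest paragraph, since max() returns the first maximum.
    """
    selected = select_cooccurrences(paragraphs, corpus, min_weight)
    links = count_links(paragraphs, selected)
    best = max(range(len(paragraphs)), key=lambda i: links[i])
    return best, links
```

With the toy article `["budget health minister reform", "budget health", "minister reform", "weather weekend"]` and the base corpus `["weather weekend", "weather weekend", "election results"]`, the pair `("weather", "weekend")` is too frequent in the corpus to be selected, and `most_significant` returns `(0, [2, 1, 1, 0])`. The first paragraph shares a selected pair with each of the next two, so it is chosen. Substituting the authors' actual resolving-power measure would only require changing `select_cooccurrences`.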
Similar resources
Using lexical chains to build hypertext links in newspaper articles
We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...
Building hypertext links in newspaper articles using semantic similarity
We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...
A Comparative Study of Topic Identification on Newspaper and E-mail
This paper presents several statistical methods for topic identification on two kinds of textual data: newspaper articles and e-mails. Five methods are tested on these two corpora: topic unigrams, cache model, TFIDF classifier, topic perplexity, and weighted model. Our work aims to study these methods by confronting them to very different data. This study is very fruitful for our research. Stat...
Automatically generating hypertext in newspaper articles by computing semantic relatedness
We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...
Extraction and Visualization of Trend Information from Newspaper Articles and Blogs
Trend information is a summarization of temporal statistical data, such as changes in product prices and sales. We propose a method for extracting trend information from multiple newspaper articles and blogs, and visualizing the information as graphs. As target texts for extraction of trend information, the MuST (Multimodal Summarization for Trend Information) workshop focuses on newspaper arti...
Publication date: 1997